Goto

Collaborating Authors

 Daegu



Decoupled Q-Chunking

Li, Qiyang, Park, Seohong, Levine, Sergey

arXiv.org Machine Learning

Temporal-difference (TD) methods learn state and action values efficiently by bootstrapping from their own future value predictions, but such a self-bootstrapping mechanism is prone to bootstrapping bias, where the errors in the value targets accumulate across steps and result in biased value estimates. Recent work has proposed to use chunked critics, which estimate the value of short action sequences ("chunks") rather than individual actions, speeding up value backup. However, extracting policies from chunked critics is challenging: policies must output the entire action chunk open-loop, which can be sub-optimal for environments that require policy reactivity and also challenging to model especially when the chunk length grows. Our key insight is to decouple the chunk length of the critic from that of the policy, allowing the policy to operate over shorter action chunks. We propose a novel algorithm that achieves this by optimizing the policy against a distilled critic for partial action chunks, constructed by optimistically backing up from the original chunked critic to approximate the maximum value achievable when a partial action chunk is extended to a complete one. This design retains the benefits of multi-step value propagation while sidestepping both the open-loop sub-optimality and the difficulty of learning action chunking policies for long action chunks. We evaluate our method on challenging, long-horizon offline goal-conditioned tasks and show that it reliably outperforms prior methods. Code: github.com/ColinQiyangLi/dqc.


CHyLL: Learning Continuous Neural Representations of Hybrid Systems

Teng, Sangli, Liu, Hang, Song, Jingyu, Sreenath, Koushil

arXiv.org Artificial Intelligence

Learning the flows of hybrid systems that have both continuous and discrete time dynamics is challenging. The existing method learns the dynamics in each discrete mode, which suffers from the combination of mode switching and discontinuities in the flows. In this work, we propose CHyLL (Continuous Hybrid System Learning in Latent Space), which learns a continuous neural representation of a hybrid system without trajectory segmentation, event functions, or mode switching. The key insight of CHyLL is that the reset map glues the state space at the guard surface, reformulating the state space as a piecewise smooth quotient manifold where the flow becomes spatially continuous. Building upon these insights and the embedding theorems grounded in differential topology, CHyLL concurrently learns a singularity-free neural embedding in a higher-dimensional space and the continuous flow in it. We showcase that CHyLL can accurately predict the flow of hybrid systems with superior accuracy and identify the topological invariants of the hybrid systems. Finally, we apply CHyLL to the stochastic optimal control problem.


GuideNav: User-Informed Development of a Vision-Only Robotic Navigation Assistant For Blind Travelers

Hwang, Hochul, Yang, Soowan, Monon, Jahir Sadik, Giudice, Nicholas A, Lee, Sunghoon Ivan, Biswas, Joydeep, Kim, Donghyun

arXiv.org Artificial Intelligence

While commendable progress has been made in user-centric research on mobile assistive systems for blind and low-vision (BLV) individuals, references that directly inform robot navigation design remain rare. To bridge this gap, we conducted a comprehensive human study involving interviews with 26 guide dog handlers, four white cane users, nine guide dog trainers, and one O\&M trainer, along with 15+ hours of observing guide dog-assisted walking. After de-identification, we open-sourced the dataset to promote human-centered development and informed decision-making for assistive systems for BLV people. Building on insights from this formative study, we developed GuideNav, a vision-only, teach-and-repeat navigation system. Inspired by how guide dogs are trained and assist their handlers, GuideNav autonomously repeats a path demonstrated by a sighted person using a robot. Specifically, the system constructs a topological representation of the taught route, integrates visual place recognition with temporal filtering, and employs a relative pose estimator to compute navigation actions - all without relying on costly, heavy, power-hungry sensors such as LiDAR. In field tests, GuideNav consistently achieved kilometer-scale route following across five outdoor environments, maintaining reliability despite noticeable scene variations between teach and repeat runs. A user study with 3 guide dog handlers and 1 guide dog trainer further confirmed the system's feasibility, marking (to our knowledge) the first demonstration of a quadruped mobile system retrieving a path in a manner comparable to guide dogs.


Dribble Master: Learning Agile Humanoid Dribbling Through Legged Locomotion

Wang, Zhuoheng, Zhou, Jinyin, Wu, Qi

arXiv.org Artificial Intelligence

Humanoid soccer dribbling is a highly challenging task that demands dexterous ball manipulation while maintaining dynamic balance. Traditional rule-based methods often struggle to achieve accurate ball control due to their reliance on fixed walking patterns and limited adaptability to real-time ball dynamics. To address these challenges, we propose a two-stage curriculum learning framework that enables a humanoid robot to acquire dribbling skills without explicit dynamics or predefined trajectories. In the first stage, the robot learns basic locomotion skills; in the second stage, we fine-tune the policy for agile dribbling maneuvers. We further introduce a virtual camera model in simulation that simulates the field of view and perception constraints of the real robot, enabling realistic ball perception during training. We also design heuristic rewards to encourage active sensing, promoting a broader visual range for continuous ball perception. The policy is trained in simulation and successfully transferred to a physical humanoid robot. Experiment results demonstrate that our method enables effective ball manipulation, achieving flexible and visually appealing dribbling behaviors across multiple environments. This work highlights the potential of reinforcement learning in developing agile humanoid soccer robots. Additional details and videos are available at https://zhuoheng0910.github.io/dribble-master/.

  Country: Asia > South Korea > Daegu > Daegu (0.04)
  Genre: Research Report > New Finding (0.66)
  Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Real-time Air Pollution prediction model based on Spatiotemporal Big data

Le, Van-Duc, Bui, Tien-Cuong, Cha, Sang Kyun

arXiv.org Artificial Intelligence

Air pollution is one of the most concerns for urban areas. Many countries have constructed monitoring stations to hourly collect pollution values. Recently, there is a research in Daegu city, Korea for real-time air quality monitoring via sensors installed on taxis running across the whole city. The collected data is huge (1-second interval) and in both Spatial and Temporal format. In this paper, based on this spatiotemporal Big data, we propose a real-time air pollution prediction model based on Convolutional Neural Network (CNN) algorithm for image-like Spatial distribution of air pollution. Regarding to Temporal information in the data, we introduce a combination of a Long Short-Term Memory (LSTM) unit for time series data and a Neural Network model for other air pollution impact factors such as weather conditions to build a hybrid prediction model. This model is simple in architecture but still brings good prediction ability.


SwarmDiffusion: End-To-End Traversability-Guided Diffusion for Embodiment-Agnostic Navigation of Heterogeneous Robots

Zhura, Iana, Karaf, Sausar, Batool, Faryal, Mudalige, Nipun Dhananjaya Weerakkodi, Serpiva, Valerii, Abdulkarim, Ali Alridha, Fedoseev, Aleksey, Seyidov, Didar, Amjad, Hajira, Tsetserukou, Dzmitry

arXiv.org Artificial Intelligence

Abstract--Visual traversability estimation is critical for autonomous navigation, but existing VLM-based methods rely on hand-crafted prompts, generalize poorly across embodiments, and output only traversability maps, leaving trajectory generation to slow external planners. We propose SwarmDiffusion, a lightweight end-to-end diffusion model that jointly predicts traversability and generates a feasible trajectory from a single RGB image. T o remove the need for annotated or planner-produced paths, we introduce a planner-free trajectory construction pipeline based on randomized way-point sampling, B ezier smoothing, and regularization enforcing connectivity, safety, directionality, and path thinness. This enables learning stable motion priors without demonstrations. SwarmDiffusion leverages VLM-derived supervision without prompt engineering and conditions the diffusion process on a compact embodiment state, producing physically consistent, traversable paths that transfer across different robot platforms. Across indoor environments and two embodiments (quadruped and aerial), the method achieves 80-100% navigation success and 0.09s inference, and adapts to a new robot using only 500 additional visual samples. ELIABLE indoor navigation is fundamental to a wide range of robotic applications, including warehouse automation [1], industrial inspection [2], search and rescue, and autonomous logistics. In these settings, robots must continuously reason about where they can safely move and how to plan a feasible trajectory through cluttered, unstructured, and dynamic spaces.


TacFinRay: Soft Tactile Fin-Ray Finger with Indirect Tactile Sensing for Robust Grasping

Nam, Saekwang, Deng, Bowen, Lee, Loong Yi, Rossiter, Jonathan M., Lepora, Nathan F.

arXiv.org Artificial Intelligence

Abstract--We present a tactile-sensorized Fin-Ray finger that enables simultaneous detection of contact location and indentation depth through an indirect sensing approach. A hinge mechanism is integrated between the soft Fin-Ray structure and a rigid sensing module, allowing deformation and translation information to be transferred to a bottom crossbeam upon which are an array of marker-tipped pins based on the biomimetic structure of the T acTip vision-based tactile sensor . Deformation patterns captured by an internal camera are processed using a convolutional neural network to infer contact conditions without directly sensing the finger surface. The finger design was optimized by varying pin configurations and hinge orientations, achieving 0.1 mm depth and 2 mm location-sensing accuracies. The perception demonstrated robust generalization to various indenter shapes and sizes, which was applied to a pick-and-place task under uncertain picking positions, where the tactile feedback significantly improved placement accuracy. Overall, this work provides a lightweight, flexible, and scalable tactile sensing solution suitable for soft robotic structures where the sensing needs situating away from the contact interface. I. INTRODUCTION Tactile sensing is essential for achieving dexterous manipulation in robotic hands [1], [2]. For example, to perform delicate tasks like gently grasping and placing eggs or glass plates, humanoid robots such as Figure's F.02 and Tesla's Optimus will need fingertip-mounted tactile sensors to become truly capable [3]. To enhance robotic dexterity, researchers have developed vision-based tactile sensors (VBTSs) that take advantage of recent advancements in computer vision [4]-[7].


Real-Time Structural Health Monitoring with Bayesian Neural Networks: Distinguishing Aleatoric and Epistemic Uncertainty for Digital Twin Frameworks

Cho, Hanbin, Yu, Jecheon, Moon, Hyeonbin, Yoon, Jiyoung, Lee, Junhyeong, Kim, Giyoung, Park, Jinhyoung, Ryu, Seunghwa

arXiv.org Artificial Intelligence

Reliable real-time analysis of sensor data is essential for structural health monitoring (SHM) of high-value assets, yet a major challenge is to obtain spatially resolved full-field aleatoric and epistemic uncertainties for trustworthy decision-making. We present an integrated SHM framework that combines principal component analysis (PCA), a Bayesian neural network (BNN), and Hamiltonian Monte Carlo (HMC) inference, mapping sparse strain gauge measurements onto leading PCA modes to reconstruct full-field strain distributions with uncertainty quantification. The framework was validated through cyclic four-point bending tests on carbon fiber reinforced polymer (CFRP) specimens with varying crack lengths, achieving accurate strain field reconstruction (R squared value > 0.9) while simultaneously producing real-time uncertainty fields. A key contribution is that the BNN yields robust full-field strain reconstructions from noisy experimental data with crack-induced strain singularities, while also providing explicit representations of two complementary uncertainty fields. Considered jointly in full-field form, the aleatoric and epistemic uncertainty fields make it possible to diagnose at a local level, whether low-confidence regions are driven by data-inherent issues or by model-related limitations, thereby supporting reliable decision-making. Collectively, the results demonstrate that the proposed framework advances SHM toward trustworthy digital twin deployment and risk-aware structural diagnostics.


Video2Act: A Dual-System Video Diffusion Policy with Robotic Spatio-Motional Modeling

Jia, Yueru, Liu, Jiaming, Liu, Shengbang, Zhou, Rui, Yu, Wanhe, Yan, Yuyang, Chi, Xiaowei, Guo, Yandong, Shi, Boxin, Zhang, Shanghang

arXiv.org Artificial Intelligence

Robust perception and dynamics modeling are fundamental to real-world robotic policy learning. Recent methods employ video diffusion models (VDMs) to enhance robotic policies, improving their understanding and modeling of the physical world. However, existing approaches overlook the coherent and physically consistent motion representations inherently encoded across frames in VDMs. To this end, we propose Video2Act, a framework that efficiently guides robotic action learning by explicitly integrating spatial and motion-aware representations. Building on the inherent representations of VDMs, we extract foreground boundaries and inter-frame motion variations while filtering out background noise and task-irrelevant biases. These refined representations are then used as additional conditioning inputs to a diffusion transformer (DiT) action head, enabling it to reason about what to manipulate and how to move. To mitigate inference inefficiency, we propose an asynchronous dual-system design, where the VDM functions as the slow System 2 and the DiT head as the fast System 1, working collaboratively to generate adaptive actions. By providing motion-aware conditions to System 1, Video2Act maintains stable manipulation even with low-frequency updates from the VDM. For evaluation, Video2Act surpasses previous state-of-the-art VLA methods by 7.7% in simulation and 21.7% in real-world tasks in terms of average success rate, further exhibiting strong generalization capabilities.